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We propose nonparametric methods for functional linear regres- 
sion which are designed for sparse longitudinal data, where both the 
predictor and response are functions of a covariate such as time. Pre- 
dictor and response processes have smooth random trajectories, and 
the data consist of a small number of noisy repeated measurements 
made at irregular times for a sample of subjects. In longitudinal stud- 
ies, the number of repeated measurements per subject is often small 
and may be modeled as a discrete random number and, accordingly, 
only a finite and asymptotically nonincreasing number of measure- 
ments are available for each subject or experimental unit. We propose 
a functional regression approach for this situation, using functional 
principal component analysis, where we estimate the functional prin- 
cipal component scores through conditional expectations. This allows 
the prediction of an unobserved response trajectory from sparse mea- 
surements of a predictor trajectory. The resulting technique is flexible 
and allows for different patterns regarding the timing of the mea- 
surements obtained for predictor and response trajectories. Asymp- 
totic properties for a sample of n subjects are investigated under 
mild conditions, as n — > oo, and we obtain consistent estimation for 
the regression function. Besides convergence results for the compo- 
nents of functional linear regression, such as the regression parame- 
ter function, we construct asymptotic pointwise confidence bands for 
the predicted trajectories. A functional coefficient of determination 
as a measure of the variance explained by the functional regression 
model is introduced, extending the standard to the functional 
case. The proposed methods are illustrated with a simulation study, 
longitudinal primary biliary liver cirrhosis data and an analysis of 
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the longitudinal relationship between blood pressure and body mass 
index. 

1. Introduction. We develop a version of functional linear regression 
analysis in which both the predictor and response variables are functions 
of some covariate which usually but not necessarily is time. Our approach 
extends the applicability of functional regression to typical longitudinal data 
where only very few and irregularly spaced measurements for predictor and 
response functions are available for most of the subjects. Examples of such 
data are discussed in Section 5 (see Figures 1 and 6). 

Since a parametric approach only captures features contained in the pre- 
conceived class of functions, nonparametric methods of functional data anal- 
ysis are needed for the detection of new features and for the modeling 
of highly complex relationships. Functional principal component analysis 
(FPCA) is a basic methodology that has been studied in early work by 
Grenander [18] and, more recently, by Rice and Silverman [27], Ramsay 
and Silverman [26] and many others. Background in probability on func- 
tion spaces can be found in [19]. James, Hastie and Sugar [21] emphasized 
the case of sparse data by proposing a reduced rank mixed-effects model 
using B-spline functions. Nonparametric methods for unbalanced longitudi- 
nal data were studied by Boularan, Ferre and Vieu [2] and Besse, Cardot 
and Ferraty [1]. Yao, Miiller and Wang [31] proposed an FPCA procedure 
through a conditional expectation method, aiming at estimating functional 
principal component scores for sparse longitudinal data. 

In the recent literature there has been increased interest in regression 
models for functional data, where both the predictor and response are ran- 
dom functions. Our aim is to extend the applicability of such models to lon- 
gitudinal data with their typically irregular designs, and to develop asymp- 
totics for functional regression in sparse data situations. Practically all inves- 
tigations to date are for the case of completely observed trajectories, where 
one assumes either entire trajectories or densely spaced measurements taken 
along each trajectory are observed; recent work includes Cardot, Ferraty, 
Mas and Sarda [3], Cardot, Ferraty and Sarda [5], Chiou, Miiller, Wang and 
Carey [7] and Ferraty and Vieu [16]. 

In this paper we illustrate the potential of functional regression for com- 
plex longitudinal data. In functional data settings, Cardot, Ferraty and 
Sarda [4] provided consistency results for the case of linear regression with a 
functional predictor and scalar response, where the predictor functions are 
sampled at a regular grid for each subject, and Cardot, Ferraty and Sarda 
[4] discussed inference for the regression function. The case of a functional 
response was introduced by Ramsay and Dalzell [25], and for a summary of 
this and related work we refer to Ramsay and Silverman ([26], Chapter 11) 
and to Faraway [15] for a discussion of relevant practical aspects. The theory 
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for the case of fixed design and functional response in the densely sampled 
case was investigated by Cuevas, Febrero and Praiman [8]. Chiou, Miiller and 
Wang [6] studied functional regression models where the predictors are finite- 
dimensional vectors and the response is a function, using a quasi-likelihood 
approach. Applications of varying-coefficient modeling to functional data, 
including asymptotic inference, were presented in [13] and [14]. 

The proposed functional regression approach is flexible, and allows for 
varying patterns of timing in regard to the measurements of predictor and 
response functions. This is relevant since it is a common occurrence in lon- 
gitudinal data settings that the measurement of either the predictor or re- 
sponse is missing. The contributions of this paper are as follows: First, we 
extend the functional regression approach to longitudinal data, using a con- 
ditioning idea. This leads to improved prediction of the response trajectories, 
given sparse measurements of the predictor trajectories. Second, we provide 
a complete practical implementation of the proposed functional regression 
procedure and illustrate its utility for two longitudinal studies. Third, we 
obtain the asymptotic consistency of the estimated regression function of the 
functional linear regression model for the case of sparse and irregular data, 
including rates. Fourth, we construct asymptotic pointwise confidence bands 
for predicted response trajectories based on asymptotic distribution results. 
Fifth, we introduce a consistent estimator for a proposed measure of associ- 
ation between the predictor and response functions in functional regression 
models that provides an extension of the coefficient of determination in 
standard linear model theory to the functional case. The proposed functional 
coefficient of determination provides a useful quantification of the strength 
of the relationship between response and predictor functions, as it can be 
interpreted in a well-defined sense as the fraction of variance explained by 
the functional linear regression model, in analogy to the situation for the 
standard linear regression model. 

The paper is organized as follows. In Section 2 we introduce basic no- 
tions, the functional linear regression model, and describe the estimation of 
the regression function. In Section 3 we discuss the extension of the con- 
ditioning approach to the prediction of response trajectories in functional 
regression under irregular and sparse data. Pointwise confidence bands and 
the functional coefficient of determination are also presented in Section 3. 
Simulation results that illustrate the usefulness of the proposed method can 
be found in Section 4. This is followed by applications of the proposed func- 
tional regression approach to longitudinal PBC liver cirrhosis data and an 
analysis of the longitudinal relationship between blood pressure and body 
mass index, using data from the Baltimore Longitudinal Study on Aging in 
Section 5. Asymptotic consistency and distribution results are provided in 
Section 6, while proofs and auxiliary results are compiled in the Appendix. 
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2. Functional linear regression for sparse and irregular data. 

2.1. Representing predictor and response functions through functional prin- 
cipal components. The underlying but unobservable sample consists of pairs 
of random trajectories {Xi,Yi), i = 1, . . . ,n, with square integrable predic- 
tor trajectories Xi and response trajectories Yi. These are reahzations of 
smooth random processes {X,Y), with unknown smooth mean functions 
EY{t) = EX{s) = nx{s), and covariance functions cov{Y{s),Y{t)) = 

Gy(s,t), cov(X{s), X{t)) = Gx{s,t). We usually refer to the arguments of 
X(-) and y(-) as time, with finite and closed intervals S and T as domains. 
We assume the existence of orthogonal expansions of Gx and Gy (in the 
sense) in terms of eigenfunctions and (pk with nonincreasing eigen- 
values pm and Afc, that is, Gx(si,S2) = E /Om^m('5i)^m(s2), t,s £ S, and 

Gy(tl,t2) = Y.k^k(t>k{h)(t>k{t2) ■, tl,t2 GT. 

We model the actually observed data which consist of sparse and irreg- 
ular repeated measurements of the predictor and response trajectories Xi 
and Yi, contaminated with additional measurement errors (see [28, 30]). To 
adequately reflect the irregular and sparse measurements, we assume that 
there is a random number of Lj (resp. Ni) random measurement times for 
Xi (resp. Yi) for the ith subject, which are denoted as Sn, . . . , SiLi (resp. 
Til, ■ ■ ■ ,TiXi)- The random variables Li and Ni are assumed to be i.i.d. as L 
and N respectively, where L and may be correlated but are independent 
of all other random variables. Let Un (resp. Vij) denote the observation of 
the random trajectory Xi (resp. Yi) at a random time Sn (resp. Tij), con- 
taminated with measurement errors en (resp. Cij), 1 < I < Li, 1 < j < Ni, 
1 < i < n. The errors are assumed to be i.i.d. with Een = 0, E[efi] = 
(resp. Eeij = 0, -E'[e?j] = cry), and independent of functional principal compo- 
nent scores Cim (resp. ^jfc) that satisfy EQm = 0, £'[CimCim'] = for m 7^ m' , 
E[CL] = Pm (resp. Eiik = 0, E[i,kiiw] = for A; 7^ k' , E[il] = \k). Then we 
may represent predictor and response measurements as follows: 

Xi{Sii) + en 

00 

P-x{Sii) + ^ Qim^m{Sii) + en, Sii£S,l<i<n,l<l<Li, 

m=l 



Tij £T,l<i<n,l<j < Ni. 

We note that the response and predictor functions do not need to be sam- 
pled simultaneously, extending the applicability of the proposed functional 
regression model. 




(2) 



V^j=Yi{Tij) + e 



00 



PviTij) + iik4>k{Tij) + eij, 

k=l 
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2.2. Functional linear regression model and estimation of the regression 
function. Consider a functional linear regression model in which both the 
predictor X and response Y are smooth random functions, 

(3) E[Y{t)\X]=a{t)+ [ (3{s,t)X{s)ds. 

Js 

Here the bivariate regression function (3{s, t) is smooth and square integrable, 
that is, /7-Xs/3^(s,t) dsdt < oo. Centralizing X{t) by X'^{s) =X{s) — fix{s), 
and observing £^[y(t)] = //y (t) = a{t) + /3(s, ds, the functional lin- 

ear regression model becomes 

(4) E[Yit)\X] = fivit) + ^ I3{s, t)X'{s) ds. 

Our aim is to predict an unknown response trajectory based on sparse 
and noisy observations of a new predictor function. This is the functional 
version of the classical prediction problem in a linear model where, given a 
set of predictors X, one aims to predict the mean response Y by estimating 
E{Y\X) (see [12], page 81). An important step is to estimate the regression 
function I3{s,t). We use the following basis representation of (3{s,t), which 
is a consequence of the population least squares property of conditional ex- 
pectation and the fact that the predictors are uncorrelated, generalizing the 
representation (3i = cov{X,Y)/ va,i{X) of the slope parameter in the sim- 
ple linear regression model E(Y\X) = /?o + PiX to the functional case. This 
representation holds under certain regularity conditions which are outlined 
in [20] and is given by 

(5) Pis,t)=f:f:^^^pus)Mt). 

k=lm.=l ^l^mi 

The convergence of the right-hand side of (5) is discussed in Lemma A. 2 
(Appendix A. 3). When referring to /3, we always assume that the limit (5) 
exists in an appropriate sense. In a first step, smooth estimates of the mean 
and covariance functions for the predictor and response functions are ob- 
tained by scatterplot smoothing; see (30) and (31) in Appendix A. 2. Then 
a nonparametric FPCA step yields estimates ipm, <Pk for the eigenfunctions, 
and pm, Afc for the eigenvalues of predictor and response functions; see (33) 
below. 

We use two-dimensional scatterplot smoothing to obtain an estimate 
C{s,t) of the cross-covariance surface C{s,t), s £ S, t £T, 

oo oo 

(6) C7(s,t) = cov(X(s),y(t)) = Y.Y1 E[CmCk]Ms)Mt)- 

k=l m=l 

Let Ci{Sii,Tij) = {Uii — jj-x{Sii)){Vij — fiyiTij)) be "raw" cross-covariances 
that serve as input for the two-dimensional smoothing step; see (36) in Ap- 
pendix A. 2. The smoothing parameters in the two coordinate directions can 
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be chosen independently by one-curve-leave-out cross-validation procedures 
[27]. From (6) we obtain estimates for akm = ^[Cm'^fc]) 



(7) 



^km = 1pm{s)C{s, t)(j)k{t) ds dt, 



m = l,...,M,k = l,...,K. 

With estimates (33), the resulting estimate for P{s,t) is 

K M ~ 

(8) Pis,t) = Y.Y.^^rnismt). 

k=lm=l P"^ 

In practice, the numbers M and K of included eigenfunctions can be 
chosen by one-curve-leave-out cross-validation (34), or by an AIC type cri- 
terion (35). For the asymptotic analysis, we consider M{n),K[n) ^ oo as 
the sample size n— > oo. Corresponding convergence results can be found in 
Theorem 1. 



3. Prediction and inference. 



3.1. Predicting response trajectories. One of our central aims is to pre- 
dict the trajectory Y* of the response for a new subject from sparse and 
irregular measurements of the predictor trajectory X* . In view of (4), the 
basis representation of (3{s,t) in (5) and the orthonormality of the {V'm}m>i) 
the prediction of the response function would be obtained via the conditional 
expectation 

oo oo 

(9) E[Y*{t)\X*]=Mt) + T.ll ^CMt), 

fc=lm=l 

where Cm — Isi-^*i'^) — fJ'x{s))'4'm{s) ds is the mth functional principal com- 
ponent score of the predictor trajectory X*. The quantities hy, 4>k, (^km and 
Pm can be estimated from the data, as described above. It remains to discuss 
the estimation of and for this step we invoke Gaussian assumptions in 
order to handle the sparsity of the data. 

Let Ui be the /th measurement made for the predictor function X* at 
time Si , according to (1), where / = !,... ,L*, with L* a random number. 
Assume that the functional principal component scores C,^ and the measure- 
ment errors for the predictor trajectories are jointly Gaussian. Following 
Yao, Miiller and Wang [31], the best prediction of the scores Cm is then 
obtained through the best linear prediction, given the observations U* = 
iUi, ■ ■ ■ , Ul,), and the number and locations of these observations, L* and 
S* = {SI, . . . , Slt)^ . Let X*{Si) be the value of the predictor function X* at 
time Si. Write X* = (X^Sl), . . .,X*{Sl,)f, = [^Jix{Sl), . . .,Px{Sl,)f 
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and V'm = (V'm('S'i), • • • ,^m(5'2.))"^. Then the best hnear prediction for 
is 

(10) c=pmrJ^dhu*-iJ^*x), 

where ^u* = cov{U*\L* , S*) = cov{X*\L* , S*) + aj^lL- , II- being the L* x 
L* identity matrix, that is, the (j, I) entry of the L* x L* matrix is 
{T,u')j,i = Gx{Sij,Sii) + aj^Sji with Sji = 1 if j = / and if j / ^• 

According to (10), estimates for the functional principal component scores 
are obtained by substituting estimates of ji^ , Pm and V'm that are based 
on the entire data cohection, leading to 

(11) Cm = PmrJ^u'{U*-(l*^), 

where (S(/«)ji = Gx{Sij, Su) + ar\5ji. The predicted trajectory is then ob- 
tained as 

KM-- 

(12) >lM(t) = /iy(t) + E E ^Cmkit). 

fc=lm=l 

In the sparse situation, the Gaussian assumption is crucial. It allows us 
to obtain the best linear predictors in (10)-(12) through conditional expec- 
tations, borrowing strength from the entire sample and thus compensating 
for the sparseness of data for individual trajectories. Simulation results, 
reported in Section 4, indicate that the proposed method is quite robust 
regarding the Gaussian assumption. Theoretical results for predicted trajec- 
tories (12) are given in Theorem 2. 

3.2. Asymptotic pointwise confidence hands for response trajectories. We 
construct asymptotic confidence bands for the response trajectory Y*{-) 
of a new subject, conditional on the sparse and noisy measurements that 
are available for the underlying predictor function. For M > 1, let C,*^ = 
{CI,...,CmV, = (Ci*,...,C|,)^, where C is as in (10), and define the 
MxL* matrix H = cov(C**^ U*\L*,S*) = {pii^l, . . .,pmiI^*mV- The covari- 
ance mjitrix of is cov{C^^\L* , S*) = HT^^Ih^ . Because = 

E[(^*'^^ \ U* , L* , S*] is the projection of C*^^ on the space spanned by the linear 
functions of ^7* given L* and S* , cov{C^^ - C^^\L* , S*) = cov{C^\L* , S*) - 
cov{C^'^\L*,S*) = CIm, where J7m = D-HJ:^Ih^, and D = diag{/>i, . . .,pm} 
Under the Gaussian assumption and conditioning on L* and 5*, then ^*^^ — 
C*-^~AA(0,Oa,). 

To construct pointwise bands for E[Y*{t)\X*] = PY{t) + T,m=i T,T=i ^km x 
'Pk{t)Qri/ Pm, let Q,M = D- H'E^Ih'^ , where D = diag{pi, . . . , pm} and H = 
(pit/jl, . . .,pmVmY- Define <t)tM = (</'i(t),. . .,4>M{t))'^ for t E T, and a. K x 
M matrix Pk,m = {c^km./ Pm)i<k<K,i<m<M- Let (j)tK-, Pk,m be the estimates 
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of (ptK, Prm obtained from the data. Write the prediction in the vector 
form Y^j^{t) = fiyit) + 4'fx^M,KC*^^ ■ Theorem 3 in Section 6 estabhshes 
that the asymptotic distribution of {i^^/ (i) — conditional on 

L* and S* can be approximated by AA(0, 4>fx^KM^MPKM^tK)- As a conse- 
quence, the (1 — a) asymptotic pointwise interval for £^[y*(i)|X*], the mean 
response at predictor level X* , is given by 

(13) nMt) ± '^(i - o^m^J^JAM^MPiu^tK, 

where $ is the standard Gaussian c.d.f. 



3.3. Coefficients of determination for functional linear regression. In stan- 
dard linear regression, a measure to quantify the "degree of linear associ- 
ation" between predictor and response variables is the coefficient of deter- 
mination i?^ (e.g., [12], page 138). The coefficient of determination plays 
an important role in applications of regression analysis, as it may be inter- 
preted as the fraction of variation of the response that is explained by the 
regression relation. Accordingly, B? is commonly used as a measure of the 
strength of the regression relationship. 

The proposed extension to functional linear regression can be motivated 
by the standard linear regression model with a response Y and a pre- 
dictor X, where the coefficient of determination B? is defined by = 
var(£^[y|X])/ var(y). This corresponds to the fraction of var(y) = 
var(£^[y|X]) + £^(var([y|X])) that is explained by the regression. In the 
functional setting, the regression function is given by (3), £'[y(t)|X] = 

/3(s, t)X(s) ds. To measure the global linear association between the func- 
tional predictor X and the functional response Y , we consider the total vari- 
ation of y explained by the regression function to be var(£'[y(t)|X]) dt, 
and by observing (9) and the orthonormality properties of {(/>fc} and {ipm}, 
one obtains 

„ oo 
(14) j Y^T{E[Y{t)\X])dt= ^IJPm. 

The analogous notion of total variation of Y is var(y (t)) dt = J^- Gy {t, t) dt - 
X^fc^i Afc. This motivates a functional version of B?' , 

(-^5) ^2 ^ Sr^^^{E[Y{t)\X])dt _ j:^m=i^lJPm 



Jrvar{Yit))dt Er=iAfc ' 

Since var(£'[y|X]) < var(y) for random variables X,Y, it follows that 
J^vaTiE[Yit)\X])dt < var(y(0) dt, that is, E^,fc=i ^L/P™ < E^Li h < 
oo. Thus, the functional B^ (15) always satisfies < i?^ < 1. 

Another interpretation of the functional coefficient of determination B^ 
(15) is as follows: Denoting by Bf.^^ the coefficients of determination of the 
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simple linear regressions of the functional principal component scores on 
Cm,'i-<k,m< oo, one finds = al^/ {pm^k), and Rl = Em=i Rim is the 
coefficient of determination of regressing ^j. on all m = 1, 2, . . . , simul- 
taneously, by successively adding predictors Cm into the regression equation 
and observing that i?| is obtained as the sum of the R\^, as the predic- 
tors Cm are uncorrelated. Then R^ = Y.t=i\kRl/ {Y.f=i\k) 

is seen to be a 

weighted average of these R\, with weights provided by the A^. According 
to (15), a natural estimate R^ for the functional coefficient of determination 
R^ is 

^-|^g-j _ Hk=l I]m=l ^kml Prn 

Y.k=l ^k 

where are as in (7). 

Besides the functional R^ (15) that provides a global measure of the 
linear association between processes X and Y , we also propose a local ver- 
sion of a functional coefficient of determination. The corresponding function 
R?'{t) may be considered a functional extension of the local R? measure 
that has been introduced in [10] and [11]. As shown above, for fixed t , 
the variation of Y{t) explained by the predictor process X is determined 
by var(i?[y(t)|X])/var(y(t)). This motivates the following definition of a 
pointwise functional coefficient of determination R?{t): 

_ var{E[Y{t)\X]) _ Y.m=iY.T/=i'^km(ye.mMt)Mt) I Pm 



^^^^ ^^'^ ^^r{Y{t)) Er=ihm 

Note that R'^{t) satisfies 0<R^{t)<l for all t G T. 

A second option to obtain an overall R^ value is to extend the pointwise 
measure R'^(t) to a global measure by taking its integral, which leads to an 
alternative definition of the global functional coefficient of determination, 
denoted by i?^, 

J_ /• var(g[y(t)|X]) 



\T\Jt var(y(t)) 
(18) 

1 f 'Em=lT,k!e=l^km(^im4>k{t)4'e{i)/Pm ,^ 

■at. 



where \T\ denotes the length of the time domain T . Natural estimates of 
R?{t) and R^ are given by 

Ef=iAfc.^i(i) 

,^ 1 /" Sm=l Y^kl=l ^km^lm(kk 



(20) R =7^ ^"=^^"'^=V dt. 
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We refer to Section 5 for further discussion of B? , R'^i't) and in applica- 
tions and to Theorem 4 in Section 6 regarding the asymptotic convergence 
of these estimates. 

4. Simulation studies. Simulation studies were based on 500 i.i.d. normal 
and 500 i.i.d. mixture samples, where 100 pairs of response and predictor 
trajectories were generated for each sample. Emulating very sparse and ir- 
regular designs, each pair of response and predictor functions was observed 
at different sets of time points. The number of measurements was randomly 
chosen for each predictor and each response trajectory, with equal probabil- 
ity from {3,4,5} for Xi uniformly, and independently chosen from {3,4,5} 
for Yi uniformly, also with equal probability. This setup reflects very sparse 
designs with at most five observations available per subject. Once their num- 
bers were determined, the locations of the measurements were uniformly 
distributed on [0, 10] for both Xi and Yi, respectively. 

The predictor trajectories Xi and associated sparse and noisy measure- 
ments Uii (1) were generated as follows. The simulated processes X had the 
mean function Hxis) = s-l-sin(s), with covariance function constructed from 
two eigenf unctions, ipiis) = — cos(7rs/10)/-v/5 and '02(s) = sin(7rs/10)/\/5, 
< s < 10. We chose pi = 2, p2 = 1 and pm = 0, m > 3, as eigenvalues, and 
o"y = 0.25 as the variance of the additional measurement errors en in (1), 
which were assumed to be normal with mean 0. For the 500 normal samples, 
the FPC scores Cim {m = 1,2) were generated from M{0,pm), while the Cim 
for the nonnormal samples were generated from a mixture of two normals, 
■^{\/Pm/2, Pm/2) with probability 1/2 and AA(— Prn/2), also with 
probability 1/2. 

For the response trajectories, letting bu = 2, 612 = 2, 621 = 1, 622 = 2, the 
regression function was (3{s,t) = Efc=i Em=i ^fcmV'm(s)V'fc(i), t,s e [0,10], 
and the response trajectories were £'[yj(t)|Xj] = Jq^ l3{s,t)Xi{s) ds. Only 
the sparse and noisy observations Vij = E[Yi{Tij)\Xi] + eij (2) were available 
for response trajectories, contaminated with pseudo-i.i.d. errors €ij with den- 
sity AA(0, 0.1). 

We investigated predicting response curves for new subjects. For each 
Monte Carlo simulation run, we generated 100 new predictor curves X*, 
with noisy measurements taken at the same random time points as Xi, and 
100 associated response curves £^[1^*|X*]. Relative mean squared prediction 
error was used as an evaluation criterion, given by 



where predicted trajectories Y*j^j^j were obtained according to (11) and (12). 
This method is denoted as CE in Table 1, and was compared with a "clas- 
sical" functional regression approach that was also based on (12), but with 



(21) 
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the conditional expectation replaced by the integral approximation = 

Eflim - fix{S*i))^m{S*,){S*i - Sli_,), denoted by IN in Table 1. The 
numbers of eigenfunctions K and M were chosen by the AIC criterion (35), 
separately for each simulation. We also included the case of irregular but 
nonsparse data, where the random numbers of the repeated measurements 
were chosen from {20, . . . , 30} for both Xi and Yi with equal probability. 
From the results in Table 1, we see that, for sparse data, the CE method im- 
proves the prediction errors by 57%/60% for normal/mixture distributions, 
while the gains for nonsparse data are not as dramatic, but nevertheless 
present. The CE method emerges as superior for the sparse data case. 

5. Applications. 

5.1. Primary biliary cirrhosis data. Primary biliary cirrhosis [23] is a 
rare but fatal chronic liver disease of unknown cause, with a prevalence of 
about 50 cases per million population. The data were collected between Jan- 
uary 1974 and May 1984 by the Mayo Chnic (see also Appendix D of [17]). 
The patients were scheduled to have measurements of blood characteristics 
at six months, one year and annually thereafter post diagnosis. However, 
since many individuals missed some of their scheduled visits, the data are 
sparse and irregular with unequal numbers of repeated measurements per 
subject and also different measurement times Tij per individual. 

To demonstrate the usefulness of the proposed methods, we explore the 
dynamic relationship between albumin in mg/dl (predictor) and prothrom- 
bin time in seconds (response), which are both longitudinally measured. We 
include 137 female patients, and the measurements of albumin level and pro- 
thrombin time before 2500 days. For both albumin and prothrombin time, 
the number of observations ranged from 1 to 10, with a median of 5 mea- 
surements per subject. Individual trajectories of albumin and prothrombin 
time are shown in Figure 1. 

Table 1 

Results of 500 Monte Carlo runs with n — 100 trajectories 
per sample. Shown are medians of observed squared 
prediction errors, RMSPE (21). Here CE is the 
conditional expectation method (11), (12) and IN stands 
for integral approximation 



Regression 




Normal 


Mixture 


Sparse 


CE 


0.0083 


0.0081 


IN 


0.0193 


0.0204 


Nonsparse 


CE 


0.0057 


0.0062 


IN 


0.0078 


0.0079 
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The smooth estimates of the mean function for both albumin and pro- 
thrombin time are also displayed in Figure 1, indicating opposite trends. The 
AIC criterion leads to the choice of two eigenfunctions for both predictor 
and response functions, and smooth eigenfunction estimates are presented 
in Figure 2. For both albumin and prothrombin time, the first eigenfunction 
reflects an overall level, and the second eigenfunction a contrast between 
early and late times. The estimate of the regression function /? is displayed 
in Figure 3. Its shape implies that, for the prediction of early prothrombin 
times, late albumin levels contribute positively, while early levels contribute 
negatively, whereas the prediction of late prothrombin times is based on a 
sharper contrast with highly positive weighting of early albumin levels and 
negative weighting of later levels. 

We randomly selected four patients from the sample for which the tra- 
jectory of prothrombin time was to be predicted solely from the sparse and 
noisy albumin measurements. For this prediction, the data of each subject to 
be predicted were omitted, the functional regression model was fitted from 
the remaining subjects, and then the predictor measurements were entered 
into the fitted model to obtain the predicted response trajectory, thus lead- 
ing to genuine predictions. Predicted curves and 95% pointwise confidence 
bands are shown in Figure 4. Note that these predictions of longitudinal 
trajectories are based on just a few albumin measurements, and the pro- 
thrombin time response measurements shown in the figures are not used. 

Regarding the functional coefficients of determination B? (15) and 

(18), we obtain very similar estimates, = 0.37 and R = 0.36, which we 
would interpret to mean that about 36% of the total functional variation 




Fig. 1. Left panel: Observed individual trajectories (solid ) and the smooth estimate of the 
mean function for albumin (thick solid). Right panel: Corresponding observed individual 
trajectories (solid) and the smooth estimate of the mean function for prothrombin time 
(thick solid), for the primary biliary cirrhosis (PBC) data. 
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Fig. 2. Left panel: Smooth estimates of the first (solid ) and second (dashed ) eigenfunc- 
tions for albumin, accounting for 87% and 8% of total variation. Right panel: Smooth 
estimates of the first (solid) and second (dashed) eigenfunctions for prothrombin time, 
accounting for 54% and 33% of total variation, for the PBC data. 
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Fig. 3. Estimated regression function (8), where the predictor (albumin) time is s (in 
days), and the response (prothrombin) time is t (in days), for the PBC data. 
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of the prothrombin time trajectories is explained by the albumin data, in- 
dicating a reasonably strong functional regression relationship. The curve 
of estimated pointwise coefficients of determination (19) is shown in 

Figure 5, left panel, which describes the trend of the proportion of the vari- 
ation of the prothrombin time, at each argument value, that is explained by 
the entire albumin trajectories. We find that the observations in the second 
half are better determined by the albumin trajectories than the values in 
the first half of the domain of prothrombin time. 

5.2. Functional regression of systolic blood pressure on body mass index. 
As a second example, we discuss a functional regression analysis of sys- 
tolic blood pressure trajectories (responses) on body mass index trajectories 
(predictor), using anonymous data from the Baltimore Longitudinal Study 
of Aging (BLSA), a major longitudinal study of human aging [29]. The data 
consist of 1590 male volunteers who were scheduled to visit the Gerontology 
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Fig. 4. Observed values (circles) for prothrombin times (not used for prediction), pre- 
dicted curves (solid) and 95% pointwise bands (dashed), for four randomly selected pa- 
tients, where bands and predicted curves are based on one- curve-leave- out analysis and 
sparse noisy albumin measurements (not shown) of the predictor function. 
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Fig. 5. Estimated pointwise coejficient of determination R^{t) (19) as a function of time 
t for the PBC data (left panel, time is duration of study in days) and for the BLSA data 
(right panel, time is age in years). 

Research Center bi-annually. Time corresponds to age of each subject and 
is measured in years. On each visit, systohc blood pressure (in mm Hg) and 
body mass index (BMI = weight in kg/height in m^) were assessed. Since 
both measurements are highly variable, the data are noisy, and as many par- 
ticipants missed scheduled visits, or were seen at other than the scheduled 
times, the data are sparse with largely unequal numbers of repeated mea- 
surements and widely differing measurement times per subject. More details 
about the study and data can be found in [24], and for previous statistical 
approaches, we refer to [22]. 

We included the participants with measurements within the age range 
[60,80], and first checked for outliers based on standardized residuals of 
body mass index (BMI) and systolic blood pressure (SEP), respectively. The 
standardized residuals are defined as residuals divided by the pooled sample 
standard deviation, where residuals are the differences between measure- 
ments and the estimated mean function obtained by scatterplot smoothing, 
using the local linear smoother. We excluded subjects with standardized 
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residuals larger (or less) than ±3, for either BMI or SBP. Individual trajec- 
tories of BMI and SBP for the included 812 subjects are shown in Figure 6, 
along with the smooth estimated mean functions of BMI and SBP. 

While average BMI decreases after age 64, SBP throughout shows an 
increasing trend. Based on the AIC criterion, three eigenfunctions are used 
for the predictor (BMI) function, and four for the response (SBP) function; 
these are displayed in Figure 7. The first eigenfunctions of both processes 
correspond to an overall mean effect, and the second eigenfunctions to a 
contrast between early and late ages, with further oscillations reflected in 
third and fourth eigenfunctions. 

The estimated regression function in Figure 8 indicates that a contrast 
between late and early BMI levels forms the prediction of SBP levels at later 
ages, where late BMI levels are weighted positive and early levels negative. 
When predicting SBP at age 80, the entire BMI trajectory matters, and 
rapid overall declines in BMI lead to the lowest SBPs, where speed of de- 
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Fig. 6. Left panel: Observed individual trajectories (solid ) and the smooth estimate of 
the mean function (thick solid) for body mass index (BMI). Right panel: Corresponding 
observed individual trajectories (solid ) and the smooth estimate of the mean function (thick 
solid ) for systolic blood pressure (SBP), for the BLSA data. 
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cline between 60 and 65 and between 75 and 80 is critical. Similar patterns 
can be identified for predicting SBP at other ages. As in tfie previous exam- 
ple, we randomly select four study participants and obtain predictions and 
95% pointwise bands for each of these, based on one-leave-out functional 
regression analysis (Figure 9). The predicted trajectories are found to be 
reasonably close to the observations, which are not used in the analysis. 

The functional coefficients of determination (15) and (18) were 
both estimated as 0.13, indicating that the dynamics of body mass index 
explains 13% of the total variation of systolic blood pressure trajectories; 
the functional regression relationship is seen to be weaker than in the previ- 
ous example. The curve of estimated pointwise coefficients of determination 
R^{t) (17) is displayed in Fi gure 5, right panel, indicating generally weaker 
linear association at older ages (beyond 70 years) as compared to the ear- 
lier ages (60 to 70 years). The minimal linear association between predictor 
trajectories and the functional response is seen to occur around age 75.7. 



6. Asymptotic properties. In this section we present the consistency of 
the regression function estimate (8) in Theorem 1. The consistency of pre- 
dicted trajectories is reflected in Theorem 2, and the asymptotic distribu- 
tions of predicted trajectories in Theorem 3. The proposed functional co- 
efficient of determination R^ (16) is shown to be consistent for R^ (15) in 
Theorem 4. 

In what follows, we only consider the case that the processes X and 
Y are infinite-dimensional. If they are finite-dimensional and there exist 
true finite K and M, such that Gx and Gy are finite-dimensional surfaces. 
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Fig. 7. Left panel: Smooth estimates of the first (solid), second (dashed) and third 
(dash-dot) eigenfunctions for BMI, accounting for 90%, 6% and 3% of total variation. 
Right panel: Smooth estimates of the first (solid), second (dashed), third (dash-dot) and 
fourth (dotted) eigenfunctions for SPB, accounting for 76%, 15%, 4% and 2% of total 
variation, for the BLSA data. 
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Fig. 8. Estimated regression function (8), where the predictor (BMI) time is s (in years), 
and the response (SBP) time is t (in years), for the BLSA data. 



then, as n tends to infinity, remainder terms such as On and in (41) 
win disappear. Appropriately modified versions of the following theoretical 
results still hold. Since in this case the true K and M are finite, most of the 
technical assumptions that are needed to handle the infinite case would not 
be needed, such as (Al)-(A3), (A6)-(A7) and (B5), and K{n),M{n) would 
be assumed to converge to the true K and M instead of infinity, as n — > oo. 

To define the convergence of the right-hand side of (5) in the sense, 
we require that 

(Al) Er=lE,n=l^L,/P^<00. 

Furthermore, the right-hand side of (5) converges uniformly on 5 x T, pro- 
vided that 

(A2) 7(s,t) =EfcliEm=i Wkm'ipmis)4'k(t)\/pm IS coutinuous in s and i, and 
the function Pkm{s, t) = J2k=i Em=i '^kmipm.{s)(l)k{t) / Pm absolutely con- 
verges to /3(s, t) for all s G 5, t G T as M, K ^ oo. 

The numbers M = M[n) and K = K{n) of included eigenfunctions are 
integer-valued sequences that depend on the sample size n; see assumption 
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Fig. 9. Observed values (circles) of systolic blood pressure (SBP) that are not used for the 
prediction, predicted curves (solid ) and 95% pointwise bands (dashed ), for four randomly 
selected patients, where bands and predicted curves are based solely on sparse and noisy 
measurements of the predictor function (BMI, not shown). 



(B5) in Appendix A.l. For simplicity, we suppress the dependency of M and 
K on n in the notation. The consistency of /3 (8) is obtained as follows. 

Theorem 1. Under (Al) and the assumptions of Lemma A.l and (B5) 
(see Appendix A.l J, 

(22) lim / / t) - f3is, t)]^ dsdt = in probability, 

n^oo J J- 

and if (Al) is replaced with (A2), 

(23) lim sup \P{s,t) — f3{s,t)\ = in probability. 

"~*°°(s,t)e5xr 

The rate of convergence in (22) and (23) depends on specific properties 
of processes X and Y in the following way: If Tn,Vn and are defined as in 
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(B.5) of Appendix A.l, and 'dn are defined as in (41) in Appendix A. 2, 
then, for (22) we obtain the rate 

j^jlKs, t) - Pis, t)f ds dt = Op{Tn + Vn + <^n + On), 

and for (23) the rate is 

sup |/3(s,t) - P{s,t)\ = Op (r„ + !;„ + ?„ + ■(?„,), 
{s,t)esxr 

as n-^ oo. Here <^n depends on bandwidths hi and /i2 that are used in the 
smoothing step (36) for the cross-covariance function C(s, t) = cov {X{s),Y{t)) 
These rates depend on specific properties of the processes X and Y, such as 
the spacings of the eigenvalues of their autocovariance operators. We note 
that, due to the sparsity of the data (at most, finitely many observations 
are made per random trajectory), fast rates of convergence cannot be ex- 
pected in this situation, in contrast to the case where entire trajectories are 
observed or are densely sampled. 

RecalUhat Y^j^,j{t) = ^vit) + Ef=i Em=i (^km4'k{t)Cm/ Pm, arid the pre- 
diction Y^Mit) = Prit) + J2k=iT>m=i^kmci>k{t)Cm/Pm, where Cm is as in 
(ll).J)efine the target trajectory Y*{t) = /i(t) + Efcli Em=i (^kmCmMt) / Pm 
and Y*j^.j = fi{t) + J2k=i Em=i (^kmC^Mt)/ Pm., where C is defined in (10). 
Assume that 

(A3) Er=iEm=i^L/(AfcP,n)<oo. 

Then E[Y*{t)\X*] and Y* are defined as the limits of Y^m(0 and Y^m^*) 
in the sense; see Lemma A. 3 in Appendix A. 3. Furthermore, we assume: 

(A4) The number and locations of measurements for a given subject or 
cluster remain unaltered as the sample size n — > oo. 

Theorem 2. Under (A3), (A4) and the assumptions of Lemma A.l and 
(B5) (see Appendix A.l), given L* and S* , for all t£T, 

(24) lim Y^,.(t) =y*(t) in probability, 

n — >oo 

This provides the consistency of the prediction Y^^^- for the target tra- 
jectory Y* . 

For the following results, we require Gaussian assumptions. 

(A5) For all 1 < i < n, m > 1 and I <l < Li, the functional principal com- 
ponent scores dm and the measurement errors en in (1) are jointly 
Gaussian. 
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Define LOKM{tiM) = (pfiRPKM^AiPKM^'tiK , for ti, t2 G T. Then iOKM{ti,t2) 
is a sequence of continuous positive definite functions, and CjKM{ii-,t2) = 
(j)f_^j^PKM^MPKM^t2K is an estimate of iOKAi{ii,t2)- We require existence 
of a limiting function and, therefore, the following analytical condition: 

(A6) There exists a continuous positive definite function uj{ti,t2) such that 
i^KM{ti,t2) ^ i^{ti,t2) as K,M ^oo. 

We obtain the asymptotic distribution of ~ E[Y*{t)\X*]} as follows, 

providing inference for predicted trajectories. 

Theorem 3. Under (A3)-(A6) and the assumptions of Lemma A.l and 
(B5) (see Appendix A.l), given L* and S* , for all t gT, x € 

(25) limp|aMW^£™n<4 = ^(.). 

Considering the measure (15), R^ is well defined since J2m k=i ^mk/ Pm < 
J2k^i ^k < OO. Furthermore, the right-hand side of (17) uniformly converges 
on t G T, provided that 

(A7) K{t) = E-m=i Em=i Wkmcr£m4>k{t)(j)eit)\/ Pm is continuous in t e T, and 
the function i?M^(t) = Em=i J2k,e=i(^kmCrim4>k{t)(p£{t) / pm absolutely 
converges to R'^{t) for all t (zT . 

The consistency of R"^ (16), R'^{t) (19) and R (20) is obtained as a conse- 
quence of Lemma A.l, (A7) and (B5). 

Theorem 4. Under the assumptions of Lemma A.l and (B5) (see Ap- 
pendix A.l ), 

(26) lim R^ = R? in probability, 

n— >oo 

and i/ (A7) is assumed, 

(27) lim sup I ^ 2 (t) - i?^ (t ) I = in probability, 

^2 

(28) lim R = R^ in probability. 

We note that the rate of convergence in (26) is the same as in the remark 
after Theorem 1, and that the rate in (27) and (28) is given by Op{Tn + Vn + 
?n +'^n), where 7r„ is as in (29). 
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7. Concluding remarks. The functional regression method we are propos- 
ing apphes to the situation where both predictors and responses are curves. 
Sparse data situations will occur also in other functional regression situa- 
tions where responses could be functions and predictors vectors, or where 
responses are scalars and the predictors are curves. The ideas presented here 
can be extended to such situations. 

The approach to functional regression we have proposed is quite flexi- 
ble, and in simulations is seen to be robust to violations of assumptions 
such as the Gaussian assumption. It is a useful tool for data where both 
predictors and responses are contaminated by errors. We refer to [9] for an- 
other approach and discussion of de-noising of single pairs of curves observed 
with measurement errors. Besides varying coefficient models, the available 
methodology for situations where one has a sample of predictor and response 
functions is quite limited. Common methods in longitudinal data analysis 
such as Generalized Estimating Equations and Generalized Linear Mixed 
Models are not suitable for this task. 

The proposed methodology may prove generally useful for longitudinal 
data with missing measurements, where missingness would be assumed to 
be totally unrelated to the random trajectories and errors. Extensions to 
situations where missingness is correlated with the time courses would be 
of interest in many practical applications. There are also some limitations 
to the functional regression approach under sparse data. Our prediction 
methods target the trajectory conditional on the available data, while the 
response trajectory given the entire but unobservable predictor trajectory 
is not accessible. While in theory it is enough that the probability that one 
observes more than one measurement per random trajectory is positive, in 
practice there needs to be a substantial number of subjects with two or more 
observations. 

Sometimes a prediction for a response trajectory may be desired even 
if there is no observation at all available for that subject. In this case we 
predict the estimated mean response function as the response trajectory. 
This is not unreasonable, since borrowing strength from other subjects to 
predict the response trajectory for a given subject is a key feature of the 
proposed method that will come more into play for subjects with very few 
measurements. Predictions for these subjects will often be relatively closer 
to the mean response than for subjects with many measurements. 

We conclude by remarking that extensions to cases with more than one 
predictor function are of interest in a number of applications, and would 
be analogous to the extension of simple linear regression to multiple linear 
regression. Functional regression is only at its initial stages and much more 
work needs to be done. 
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APPENDIX 

A.l. Assumptions and notation. The data {Su^Uu) and {Tij,Vij), 
i = I, . . . ,n, I = 1, . . . , Li, j = 1, . . . , Ni, as described in (1) and (2), are as- 
sumed to liave tlie same distribution as (5, U) and (T, V), with joint densities 
gi{s,u) and g2it,v). Assume that the observation times Su are i.i.d. with 
marginal density Dependence is allowed for observations Uu-^ and Uu^ 

made for the same subject or cluster, and analogous properties hold for Vij, 
where Tij are i.i.d. with marginal density frit)- We make the following as- 
sumptions for the number of observations Lj and Ni that are available for 
the ith subject: 

(Bl.l) The number of observations Lj and Ni made for the ith subject or 
cluster are random variables such that Lj L, Ni N, where L 
and N are positive discrete random variables, with P{L > 1) > and 
P(iV> 1) >0. 

The observation times and measurements are assumed to be independent of 
the number of measurements, that is, for any subsets vCj C {1, . . . , Lj} and 
i7i C {1, . . . , Ni}, and for all i = 1, . . . , n, 

(B1.2) {{Su : / G Ci},{Uii : I £ Ci}) is independent of Li, and {{Tij : j G Ji}, 
{Yij :j G i7j}) is independent of Ni. 

Let Ki{-) and K2{-,-) be nonnegative univariate and bivariate kernel 
functions that are used in the smoothing steps for the mean functions ^ix, 
fly, covariance surfaces Gx, Gy, and cross-covariance structure G. Assume 
that Ki and K2 are compactly supported densities with zero means and fi- 
nite variances. Let bx = bx{n), by = by{n), hx = hx{n), hy = hY{n) be 
the bandwidths for estimating fix and fiy (30), Gx and Gy (31), and 
hi = hi{n), h2 = h2{n) be the bandwidths for obtaining C (36). We develop 
asymptotics as the number of subjects n ^ 00, and require the following: 

(B2.1) bx — > 0, 6y — > 0, nb\ — > 00, nby — > 00, nb^ < 00 and nby < 00. 
(B2.2) hx — > 0, hy ^ 0, n/ij^ — > 00, nhy — > 00, n/i ^ < cxd and nhy < 00. 
(B2.3) Without loss of generality, /ii//i2 — > 1, n^hf — > 00 and nhf < 00. 

Define the Fourier transformations of Ki (u) , K2 {u,v) hy Ki{t) = J e~*"*i^i (n) du 
and K2{t,s) = J e~^^^^^'^^^^ K2{u,v) dudv. They satisfy the following: 

(B3.1) Ki{t) is absolutely integrable, that is, / |Ki(t)| dt < 00. 

(B3.2) K2{t,s) is absolutely integrable, that is, / / \K2{t,s)\ dtds < 00. 

Assume that the fourth moments of y and U, centered at ij-y(T) and iJ,x{S), 
are finite, that is, 

(B4) E[{Y - /uy (r))4] < 00, E[{U - fixiS))^] < 00. 
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Let Si and S2 be i.i.d. as S, and Ui and U2 be the repeated measure- 
ments of X made on the same subject, taken at Si and 52 separately. 
Assume {Sii-^^Su^^Uii^^Uu^)^ ^ < h ^ h ^ Li, is identicahy distributed as 
(5i, C/i, C/2) with joint density function gx{si,S2, U2), and analogously 
for {Tij^,Tij^,Vij^,Vij^) with identical joint density function (^y (ti, t2; ^^i, '^2)- 
Appropriate regularity assumptions are imposed for the marginal and joint 
densities, fs{s), frit), giis,u), g2(t,v), gx{si, S2,ui,U2) and gY{ti,t2,vi,V2). 

Define the rank one operator f ^ g = {f, h)y, for f,h£H, and denote the 
separable Hilbert space of Hilbert-Schmidt operators on by = a2{H), 
endowed by {Ti,T2)f = tr(Tir2*) = Y.j{TiUj,T2Uj) h and ||T||^ = {T,T)f, 
where Ti,T2,T £ F, and {uj : j > 1} is any complete orthonormal system in 
H. The covariance operator Gx (resp. Gx) is generated by the kernel Gx 
(resp. Gx), that^is, Gx{f) = JsGx{s,t)f{s) ds, G{f) = JsGx{s,t)f{s) ds. 
Then Gx and Gx are Hilbert-Schmidt operators, and Theorem 1 in [31] 

implies that ||Gx — Gx||f = Op{l / {^/nhx)) ■ 

Let li = {j : pj = Pi}, I' = {i: \Ii\ = 1}, where |Xj| denotes the number 

of elements in Tj. Let Pf = J2kei, V'fc ® tpk, and Pf = J2mei, V'm O tpm 
denote the true and estimated orthogonal projection operators from H to 
the subspace spanned by {ipm :m£lj}. For fixed j, let 

5f = ^mm{\pi - pj\:l^Ij}, 

and let A^x = {z C :\z — pj \ = 6f}, where C stands for the set of complex 

numbers. The resolvent of Gx (resp. Gx) is denoted by Rx (resp. Rx), 
that is, Rx(^) = (Gx - zl)''^ [resp. Rx(^) = (Gx - zl)''^]. Let 

A^x =sup{||Rx(2)||f:2 G A^x}, 



3 



and analogously define sequences {(5- } and {^4^^} for the response process 



Y. We assume that the numbers M = M{n) and K = M{n) of included 
eigenfunctions depend on the sample size n, such that as n — > 00, if M{n) — > 
00 and K{n) — > 00, 

m=l ''A 



= E ^ 



; Vnhl - Aq 
0. 



KM 



^/n/ii/i2 

The main effect of this assumption is to impose certain constraints on the 
rate of K and M in relation to n and the bandwidths. 



FUNCTIONAL REGRESSION ANALYSIS 



25 



We denote the remainder in (A7) by 



(29) Tin = sup 



m=M(n) k/=K(n) 



A. 2. Estimation procedures. We begin by summarizing the estimation 
procedures for the components of models (1) and (2) in the fohowing; see 
Yao, Miiller and Wang [31] for further details. Define the local linear scat- 
terplot smoothers for fJ,x{s) through minimizing 

" / Q., _ s\ 



(30) E E ^1 [^t^J i^^^ - ^o" - Pt is - Su)r, 

with respect to (3^ , P^- , leading to p,x{s) = $o{s). 

Let Gf{Sii^,Sii2) = (Uii^ - p'x{Sii^))(Uii2 -fixiSu^)), and define the local 
linear surface smoother for Gx{si,S2) through minimizing 

(Sii^—si 811^ — 82" 

E M u ^—f— 

(31) i=ii<h^h<U ^ '^^ 

where /(/3^, (si, S2), (^iZi , ^j^J) = f3^ + Pu{si - Su^) + f3^2{s2 - Su^), with 
respect to = (/3^ , /3i^), yielding S2) = /3^(si, S2). 

For the estimation of cr^, we fit a local quadratic component orthogonal 
to the diagonal of Gx, and a local linear component in the direction_of the 
diagonal. Denote the diagonal of the resulting surface estimate by Gx{s), 
and a local linear smoother focusing on diagonal values {Gx{s,s) + a^} 
by Vxis). Let ax = inf{s : s E 5}, bx = sup{s : s E 5}, |5| = bx — ax, Si = 
[ax + \S\/A,bx - |5|/4]. The estimate of aj^ is 

(32) al=2 f {Vx{s)-Gx{s)}ds/\Sl 

if > and aj^ = otherwise. 

The estimates of {pm,'4'm}m>i correspond to the solutions {pm,V'm}m>i 
of the eigenequations 

(33) Gx{si,S2)lpmisi)dsi = pmi! 



S 

with orthonormal constraints on {ipm}m>i- 

Let jlj^ *^ and i^m be the estimated mean and eigenfunctions after remov- 
ing the data for Xi. One-curve- leave-out cross-validation aims to minimize 

n Li 

(34) C^/x{M)=Y^Y.iU^l-Xt'\Su)f 

i=l 1=1 



(35) 
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with respect to M, where X^"^ (s) = /i^'^ (s) +E^=i Ci;;'^^"'^ (s) , and ct'^ 
is obtained by (11). The AIC criterion as a function of M is given by 

" f 1 /- . . V 

AIC(M) =Y^}-^iu^-fLX,-Y^ GmV'im 
j=l I ^ \ m=l / 

^ . . \ 
\ m=l I 

+ :^log(27r) + :|log4-|+M, 

where tji = {Un, . ..,UiL^Y, fix, = (iJ-xiSn), . . . , fix{SiL,)Y , ipim = (V'm(5'ii), 
. . . jipmiSiL^)'^ , and (im is obtained by (11). We proceed analogously for the 
corresponding estimates for the components of model (2) regarding the re- 
sponse process Y. 

The local linear surface smoother for the cross-covariance surface C{s,t) 
is obtained through minimizing 

(36) EEE^^p^,^^ {c.(5,,,r,,)-/(/3,(.,t),(5,,,r,,))}2 

i=lj=ll=l \ '^1 "2 / 

with regard to /3 = (/?0) Ai) /3i2)5 leading to C(s,t) = /3o('S,i)- 

A. 3. Preliminary consistency results. Applying Theorems 1 and 2 of [31], 
we obtain uniform consistency of the estimate of the mean functions, covari- 
ance functions, eigenvalues and eigenfunctions of the predictor and response 
processes. Under assumption (A2.3), this extends to the cross-covariance 
function. 

Lemma A.l. Under (B1.1)-(B5), and appropriate regularity assump- 
tions for fs{s), frit), gi{s,u), g2(t,v), gx{si,S2,ui,U2) and gY(ti,t2,vi,V2), 

(37) \pm — Pm\ =0p[ 2 — I , I Afc - Afcl = Op ^ 



Considering eigenvalues pm and Afc of multiplicity one respectively, ipm o.nd 
can be chosen such that 
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and, furthermore, 

(39) sup \C{s,t)-C{s,t)\=o/ ^ 



{s,t)esxr \^/nhih2 
As a consequence of (38) and (39), 



(40) \crkm - CTfcml = Op ( max<^ 



°k 1 



i ^/nh\ — Agx ' y/nhy — A^y ' ■y/n/ii/i2 

where the Op{-) terms in (37), (38) and (40) /loZd uniformly over all k and 
m. 

Proof. Part of the proof follows that of Theorem 2 in [31]. Additional 
arguments are needed to obtain the convergence rates in (38). Since Rx(-z) = 
Kxiz)[I+iGx-Gx)Rxiz)r' = Rx{z)EfZo[{Gx-Gx)B.x{z)]\ \\Rx{z)- 
nx{z)\\F < {\\Gx - Gx||f||Rx(2)||f)/(1 - ||Gx - Gx||f||Rx(2)||f). Note 
that Pf = (27ri)-i /. Rx(^) dz, Pf = {2m)-'^ /. Kx{z) dz. Therefore, 

J J 

||Pf-Pf||F< / \\Rx{z)-Rx{z)\\Fdz/{27r) 



<6f ^ 

1 - ||Gx - GxWfA^x 

6fAsx 

< 



y/nh\ - A^x 

Applying the arguments used in the proof of Theorem 2 in [31] leads to 
(37) and (38). Under (A2.3), the uniform convergence of C is an extension 
of Theorem 1 in [31]. Then (40) follows by applying (38) and (39). □ 

Lemma A. 2. Under (Al) the right-hand side of (5) converges in the 
sense. Furthermore, under (A2) the right-hand side of (5) converges 
uniformly on S x T. 

Proof. Let pKM{s,t) = Y.k=iT.m=i^km'4^m{s)<i)k{t)/ pm- Observing 
the orthonormality of {'0m}m>i and {(/>fc}fc>i, Jt Is f^KAli^''''^) dt = 
J2k=iJ2m=i^km/ Prm and it is obvious that Pkm converges in the 
sense under (Al), that is, Jq- Isif^KM{s,t) — P{s,t)]'^ ds dt ^ as 
K,M ^ oo. Let P = limK,M^oo Pkm- For ah (s, t) e S xT, let jkm{s, t) = 
Ef=iEm=i Wkm'4^m{s)<i)k{t)\/ pm- By (A2), the monotonically increasing net 
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of continuous real- valued functions {'yKAiis, t)} converges pointwise to a con- 
tinuous function 7(5, i). By applying Dini's theorem, the net {'yKM{s,t)} 
converges to 7(5, i) uniformly on the compact set S xT, which implies the 
uniform convergence of the right-hand side of (5). 

We denote the remainders in (Al) and (A2) as M{n),K{n) — > 00 as 



00 ^2 



(41) 



E E 

k=K{n) m=M{n) 



km 



Pn 



sup 

s,teT xS 



k=K{n)m=M(n) 



□ 



Lemma A. 3. // (A4) holds, as K,M ^00, 



(42) 



snpE[Y^M{t)-E[Y*{t)\X*] 



and for given L* and S* , 
(43) 



snpE[Y^^it)-Y*{t)Y 



0. 



Proof. To prove (42), note that 



snpE[Y^,,{t)-E[Y*{t)\X* 



00 00 2 

E E 



m=M+l k=K+l 



Pm^k teT 



sup{Xk(pl{t)}. 



From the Karhunen-Loeve theorem, we know that J2k^i ^k4'{s)4'{t) con- 
verges uniformly in s,t S T, which implies that sup^^q- Xk(t>k{t)^ converges 
to zero as ^ 00. If (A4) holds, then (42) follows. Given L* and S* , since 



sup< E 
teT [ 



E 



E E 

.AI+lK+l 



Crkm4'k{t)Cm 



Pr, 





2- 




u* 




+ E 



/ ^ ^ (Tkm(l>k{t)Q 

2^ 2^ 

\M+1 K+l 



U* 



00 00 2 



J2 Y^^snp{Xk4m^0, 



and 



E 



(00 00 
E E 
^km 
A/+1 K+l 



> 



for all t G T. 



The result (43) follows. □ 
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A.4. Proofs of the main results. 

Proof of Theorem 1. To prove (22), observing the orthonormahty of 
the eigenfunction basis, 



'${s,t) - P{s,t)Y dsdt 
K~1M-1 



T JS 



E E 

fc=l m=l 



. Pm Pm 



ds dt 



oo oo ^2 



+ E E 



km, 



k=Km=M '^'^ 

oo oo 



+ 



rJs 



E E 

k=Km=M 



Cfkn 



Pn 



■i^m{s)(t>k{t) 



■K-lM-1 

E E 

, k=l m=l 



^^i'm{s)4>k{t) - ^^1pm{s)(f)k{t) 



ds dt 



= Qi{n) + Q2in) + Q3{n). 



Then (Al) and (B5) imply that Q2{n) ^ as M{n),K{n) oo, that is, 
n^oo. By (37), (38), (40) and (B5), Slutsky's theorem imphes 



Qiin) 



/ M 

Op E 

\m=l 



6^ A 



+ E 



+ 



KM 



0, 



(44) 

as 71 oo. For Qsin), using the Cauchy-Schwarz inequaUty, Qsin)'^ < Qi{n) x 
Q2{n), thus Qsin) ^ as n ^ oo. Then (22) fohows. To prove (23), note 
that 



sup |/3(s, t) — f3{s, t)\ < sup 

s.t s,t 



E E 

k=l m=l 



^^1pm{s)(i>kit) - ^^'4^mis)4>kit) 



+ sup 



oo oo 

E E 

k=K m=M 



(^kr^ 



= QA{n) + Q5{n). 

One has Q^in) ^ as M{7i),K{n) ^ oo if (A2) holds. Observing (37), (38), 
(40) and (B5), Q4(ra) ^ as n ^ oo, leading to (23). □ 



Proof of Theorem 2. For given L* and S*, define Y^]^,j{t) = 
PY{t) + T,k=iT,m=i(^kmCm4'kit)/Prn, where Cm is defined in (10). Recall 
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Ef=iEm=iO-fcmC<^fc(t)/Pm, with ^ defined in (11). Note that 

\YKMit) - Y*{t)\ < \Y^M{t) - Y^cMm + \YKMit) - y*m- 

Lemma A. 3 imphes Y^j^,i{t) Y*{t) as K,M ^ oo and n ^ oo. Applying 
Theorem 1 in [31], Lemma A.l and (B5), one has supig^- |/iy (t) — /iy(t)| = 
Op(l/(VH6r)) andJC;; -Cml = Op{5^Asx/{^h\ - Asx)) asn^oo. Then 
sup^g^ \ Y^jy,j{t) — Y^j^,f{t)\ ^ as n ^ oo by Slutsky's theorem, and (24) 
follows. □ 

Proof of Theorem 3. Analogous to the proof of Theorem 4 in [31] 
with slight modifications similar to the arguments used in the proof of The- 
orem 1. □ 

Proof of Theorem 4. To prove (26), note 

EM sr^K -2 /- Y^oo 2 /„ 

m=l l^k=l ^km/ P _ ^m,k=l " km/ Pm 

^ Efc=l Em=l ^km/ Pm _ Efc=l Em=l '^km/ Pm 
EfcLlAfc J2k=l^k 

_|_ EfcLl Em=l '^fcm/ Pm _ J2m,k=l ^km/ Pm 
EfcLlAfc EfcllAfc 

^Qi{n) + Q2{n). 

Since the nonnegative series J2m k=i'^km/ Pm ^J2T=i^k < oo, we have 
Q2{n) ^0 as M,K ^ oo. Observing (37), (38), (40) and (B5), one finds 
that Qi(n) AO as n — > oo, by applying similar arguments to (44), leading 
to (26). It is obvious that (27) implies (28). To show (27), let fKhiit) = 
Em=iEM=i \crkm(yim(t>k{t)(t)t{t)\/ Pm for all t G T. By (A7) and Dini's theo- 
rem, the net {vKM{t)} converges to v{t) uniformly on the compact set T, 
which implies the uniform convergence of Rj^^it) as ET, M — > cx). Applying 
arguments similar to those used to prove (26), one obtains (27). □ 
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